Streaming Similarity Self-Join

Abstract

We introduce and study the problem of computing the similarity self-join in a streaming context (SSSJ), where the input is an unbounded stream of items arriving continuously. The goal is to find all pairs of items in the stream whose similarity is greater than a given threshold. The simplest formulation of the problem requires unbounded memory, and thus, it is intractable. To make the problem feasible, we introduce the notion of time-dependent similarity: the similarity of two items decreases with the difference in their arrival time. By leveraging the properties of this time-dependent similarity function, we design two algorithmic frameworks to solve the SSSJ problem. The first one, MiniBatch (MB), uses existing index-based filtering techniques for the static version of the problem, and combines them in a pipeline. The second framework, Streaming (STR), adds time filtering to the existing indexes, and integrates new time-based bounds deeply in the working of the algorithms. We also introduce a new indexing technique (L2), which is based on an existing state-of-the-art indexing technique (L2AP), but is optimized for the streaming case. Extensive experiments show that the STR algorithm, when instantiated with the L2 index, is the most scalable option across a wide array of datasets and parameters.
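
The idea of time-dependent similarity can be illustrated with a small sketch. The abstract does not give the exact form of the similarity function, so the code below assumes cosine similarity multiplied by an exponential decay factor exp(-lam * dt); the names `streaming_self_join`, `theta`, and `lam` are hypothetical. Under that assumption, any item older than ln(1/theta)/lam can never exceed the threshold again and can be dropped, which is what keeps memory bounded. A brute-force scan of the retained window stands in for the index-based MB and STR frameworks, which are not reproduced here.

```python
import math
from collections import deque


def cosine(x, y):
    """Cosine similarity of two sparse vectors given as {dimension: weight} dicts."""
    dot = sum(w * y[d] for d, w in x.items() if d in y)
    nx = math.sqrt(sum(w * w for w in x.values()))
    ny = math.sqrt(sum(w * w for w in y.values()))
    return dot / (nx * ny) if nx and ny else 0.0


def streaming_self_join(stream, theta, lam):
    """Yield (older_id, newer_id, sim) for pairs whose decayed similarity exceeds theta.

    `stream` is an iterable of (item_id, timestamp, vector) tuples in
    non-decreasing timestamp order. The decay form is an assumption:
    sim = cosine(x, y) * exp(-lam * |t_x - t_y|).
    """
    # Cosine is at most 1, so once exp(-lam * dt) < theta no pair can qualify.
    horizon = math.log(1.0 / theta) / lam
    window = deque()  # items still young enough to match future arrivals
    for item_id, t, vec in stream:
        # Time filtering: evict items that fell outside the horizon.
        while window and t - window[0][1] > horizon:
            window.popleft()
        # Brute-force comparison against the remaining window.
        for old_id, old_t, old_vec in window:
            sim = cosine(old_vec, vec) * math.exp(-lam * (t - old_t))
            if sim > theta:
                yield old_id, item_id, sim
        window.append((item_id, t, vec))


if __name__ == "__main__":
    demo = [
        ("a", 0.0, {"x": 1.0}),
        ("b", 1.0, {"x": 1.0}),
        ("c", 100.0, {"x": 1.0}),  # identical content, but far too late to match
    ]
    for pair in streaming_self_join(demo, theta=0.5, lam=0.1):
        print(pair)
```

In the demo, items "a" and "b" arrive one time unit apart and are reported, while "c" has the same content but arrives after both have been evicted, so it produces no pair.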